# **Chapter 4: Input/Output System**

Chapter 4.1 Dept. of Comp. Arch., UMA, 2018



## **Input and Output Devices**

- I/O devices are incredibly diverse with respect to:
  - Behavior: input, output, or storage
  - Partner: human or machine
  - Data rate: the peak rate at which data can be transferred between the I/O device and the main memory or processor

| Device           | Behavior        | Partner | Data rate (Mb/s) |
|------------------|-----------------|---------|------------------|
| Keyboard         | input           | human   | 0.0001           |
| Mouse            | input           | human   | 0.0038           |
| Laser printer    | output          | human   | 3.2              |
| Magnetic disk    | storage         | machine | 800-3000         |
| Graphics display | output          | human   | 800-8000         |
| Network/LAN      | input or output | machine | 100-10000        |

Data rates span about eight orders of magnitude, from 0.0001 Mb/s (keyboard) to 10000 Mb/s (network).


### I/O Performance Measures

- I/O bandwidth (throughput): the amount of information that can be input (output) and communicated across an interconnect (e.g., a bus) to the processor/memory (I/O device) per unit time
  - 1. How much data can we move through the system in a certain time?
  - 2. How many I/O operations can we do per unit time?
- I/O response time (latency): the total elapsed time to accomplish an input or output operation
  - An especially important performance metric in real-time systems
- Expandability: is there an easy way to connect another disk to the system?
- Resilience: if this I/O controller (or network) fails, will it affect the rest of the network?
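As a rough illustration of the two bandwidth questions and of latency, here is a minimal C sketch that derives all three figures from one measurement. The `io_run` type and the sample numbers mentioned afterwards are hypothetical, chosen only to show the arithmetic:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical measurement of an I/O run: ops operations of equal size. */
typedef struct {
    uint64_t ops;          /* number of I/O operations completed */
    uint64_t bytes_per_op; /* size of each operation in bytes    */
    double   seconds;      /* elapsed time for the whole run     */
} io_run;

/* Question 1: how much data moves per unit time (bytes per second). */
double bandwidth_bytes_per_s(io_run r) {
    assert(r.seconds > 0.0);
    return (double)(r.ops * r.bytes_per_op) / r.seconds;
}

/* Question 2: how many I/O operations complete per unit time. */
double iops(io_run r) {
    assert(r.seconds > 0.0);
    return (double)r.ops / r.seconds;
}

/* Average time per operation when operations run one after another. */
double mean_latency_s(io_run r) {
    assert(r.ops > 0);
    return r.seconds / (double)r.ops;
}
```

With made-up runs of 1000 operations of 4 KiB and 10 operations of 1 MiB, each taking 2 s, the small-request run achieves 100 times the operations per second but less than half the bandwidth, which is why both questions are asked separately.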


# **Input/output Ports**


## I/O ports

- Communication between CPU and I/O devices
  - How does the processor communicate with devices other than main memory?
    - By using input/output ports
  - I/O ports
    - Input port: transfers from an external device to the CPU
    - Output port: transfers from the CPU to an external device
    - Input/output ports: transfers in both directions




## **I/O Commands**

- I/O devices are managed by I/O controller hardware, which:
  - Transfers data to/from the device
  - Synchronizes operations with software
- Ports in an I/O controller:
  - Command registers
    - Cause device to do something
  - Status registers
    - Indicate what the device is doing and occurrence of errors
  - Data registers
    - Write: transfer data to a device
    - Read: transfer data from a device
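One way to picture a controller with command, status and data registers is as a C struct whose fields sit at fixed offsets. This is a sketch with an invented register map: the names, offsets, bit positions and command codes are hypothetical, not those of a real device, and a real driver would point at the controller's address instead of an ordinary variable:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical register map: one 32-bit word per controller port. */
typedef struct {
    uint32_t command; /* 0x0: written to make the device do something         */
    uint32_t status;  /* 0x4: what the device is doing, plus error indications */
    uint32_t data;    /* 0x8: write = data to device, read = data from device  */
} io_controller;

/* Hypothetical status bits and command codes. */
enum { STATUS_READY = 1u << 0, STATUS_ERROR = 1u << 1 };
enum { CMD_WRITE = 0x1, CMD_READ = 0x2 };

/* The layout must match the hardware, so check the offsets at compile time. */
_Static_assert(offsetof(io_controller, command) == 0x0, "command at 0x0");
_Static_assert(offsetof(io_controller, status) == 0x4, "status at 0x4");
_Static_assert(offsetof(io_controller, data) == 0x8, "data at 0x8");

/* Status register: did the last operation fail? */
int device_error(const volatile io_controller *c) {
    return (c->status & STATUS_ERROR) != 0;
}

/* Fill the data register first, then write the command that starts the operation. */
void controller_write_word(volatile io_controller *c, uint32_t word) {
    assert(c->status & STATUS_READY); /* caller must wait until the device is ready */
    c->data = word;
    c->command = CMD_WRITE;
}
```

The `volatile` qualifier makes the compiler perform every register access in program order instead of merging or removing them, which matters when the struct overlays real device ports.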




- User programs (processor in user, i.e., unprivileged, mode) are prevented from issuing I/O operations directly because the OS does not give them access to the I/O ports
- The I/O ports can be accessed only when the processor is in kernel (supervisor) mode
- Two ways for the processor to direct the I/O devices:
  - 1. Memory-mapped I/O
  - 2. Specific I/O instructions


- How the processor directs the I/O devices
  - 1. Memory-mapped I/O
    - Portions of the high-order memory address space are assigned to each I/O device
    - Reads and writes to those memory addresses are interpreted as commands to the I/O devices
    - Loads/stores to the I/O address space can only be done by the OS
    - MIPS processor:
      - Load instruction to read from an I/O device
        - e.g., `lw $4, 100($5)`
      - Store instruction to write to an I/O device
        - e.g., `sw $4, 100($5)`
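In C, memory-mapped I/O is usually written as loads and stores through a `volatile` pointer, so the compiler emits every access instead of caching or removing it. A minimal sketch of the two MIPS instructions above; the helper names `mmio_read32` and `mmio_write32` are hypothetical. To test it off the board, `base` can point at an ordinary array; on real hardware, user-mode code would not be allowed to touch the device range:

```c
#include <assert.h>
#include <stdint.h>

/* Like "lw $4, 100($5)": read the 32-bit word at base + offset.
   The volatile access forces a real load every time. */
static inline uint32_t mmio_read32(uintptr_t base, uintptr_t offset) {
    assert(offset % 4 == 0); /* word accesses must be aligned */
    return *(volatile uint32_t *)(base + offset);
}

/* Like "sw $4, 100($5)": write a 32-bit word to base + offset. */
static inline void mmio_write32(uintptr_t base, uintptr_t offset, uint32_t value) {
    assert(offset % 4 == 0);
    *(volatile uint32_t *)(base + offset) = value;
}
```

Byte offset 100 selects the 26th word (index 25) from `base`, exactly as the MIPS displacement does.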




- How the processor directs the I/O devices
  - 2. I/O instructions
    - Separate instructions to access I/O registers
    - Can only be executed in kernel mode
    - Example: x86:
      - Control signal: $\overline{IO}/M$ (1 = memory, 0 = I/O)
      - Specific input instruction to read from an I/O device
        - e.g., `in al, 37h`
          - Port 37h → AL register
      - Specific output instruction to write to an I/O device
        - e.g., `out 37h, al`
          - AL register → port 37h




## Raspberry Pi

- Raspberry Pi 2 Model B: 900 MHz quad-core ARM Cortex-A7 CPU, 1 GB RAM
- Like the (Pi 1) Model B+, it also has: 100BASE-T Ethernet, 4 USB ports, 40 GPIO pins, a full HDMI port, a combined 3.5 mm audio jack and composite video output, a camera interface (CSI), a display interface (DSI), a micro SD card slot, and a VideoCore IV 3D graphics core.




## Raspberry Pi: GPIO port





- The Raspberry Pi SoC manages up to 54 GPIO pins
- Only the pins routed to the board's 40-pin header are accessible
- □ GPIO ports are mapped in memory, starting at 0x3F200000






### **GPIO** memory mapping

#### GPIO ports

- GPFSELn: GPIO Function Select Registers
  - The 54 pins are configured through 6 memory ports, GPFSEL0 to GPFSEL5
  - Each port defines 10 groups named FSELx, FSEL0 to FSEL9
  - A group consists of 3 bits (000 = input, 001 = output); bits 30-31 of each port are unused
  - GPFSEL0 controls GPIO0 to GPIO9, GPFSEL1 controls GPIO10 to GPIO19, ...
- GPSETn: GPIO Pin Output Set Registers
  - GPSET0 sets pins 0 to 31, and GPSET1 sets pins 32 to 53
- GPCLRn: GPIO Pin Output Clear Registers
  - GPCLR0 clears pins 0 to 31, and GPCLR1 clears pins 32 to 53
- A SET or CLR operation only needs a 1 in the pin's bit position and affects only that pin; a 0 leaves the pin unchanged
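The register arithmetic in this list can be written down directly. A minimal C sketch, assuming the offsets used in this chapter (GPFSEL0 = 0x00, GPSET0 = 0x1C, GPCLR0 = 0x28, with GPSET1 at 0x20 and GPCLR1 at 0x2C); the function names are hypothetical helpers, not a library API:

```c
#include <assert.h>
#include <stdint.h>

/* GPIO register offsets from GPBASE (0x3F200000). */
#define GPFSEL0 0x00u /* GPFSEL1..GPFSEL5 follow at +4 each */
#define GPSET0  0x1Cu /* GPSET1 at 0x20 */
#define GPCLR0  0x28u /* GPCLR1 at 0x2C */

#define FSEL_INPUT  0u /* 000 */
#define FSEL_OUTPUT 1u /* 001 */

/* Byte offset of the GPFSELn register that configures a pin (10 pins per register). */
uint32_t fsel_offset(unsigned pin) {
    assert(pin < 54);
    return GPFSEL0 + 4u * (pin / 10);
}

/* Value with the pin's 3-bit FSEL group set to func and every other group 0. */
uint32_t fsel_value(unsigned pin, uint32_t func) {
    assert(pin < 54 && func < 8);
    return func << (3u * (pin % 10));
}

/* GPSETn/GPCLRn byte offsets and the bit mask for a pin (32 pins per register). */
uint32_t set_offset(unsigned pin) { assert(pin < 54); return GPSET0 + 4u * (pin / 32); }
uint32_t clr_offset(unsigned pin) { assert(pin < 54); return GPCLR0 + 4u * (pin / 32); }
uint32_t pin_mask(unsigned pin)   { assert(pin < 54); return 1u << (pin % 32); }
```

For GPIO9, `fsel_offset` gives 0x00 (GPFSEL0) and `fsel_value(9, FSEL_OUTPUT)` gives 0x08000000, the constant written in Example 1; `pin_mask(9)` gives 0x200 for GPSET0. Storing `fsel_value` directly, as the examples do, also sets the other nine pins of that register to input; a read-modify-write would preserve them.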














### **Example 1**

Code for turning on a red LED (GPIO9):

```
.set GPBASE,  0x3F200000
.set GPFSEL0, 0x00
.set GPSET0,  0x1c
.text
        ldr  r0, =GPBASE
/* guide bits       xx999888777666555444333222111000 */
        mov  r1, #0b00001000000000000000000000000000
        str  r1, [r0, #GPFSEL0]     @ Configure GPIO 9 as output
/* guide bits       10987654321098765432109876543210 */
        mov  r1, #0b00000000000000000000001000000000
        str  r1, [r0, #GPSET0]      @ Turn GPIO 9 on
infi:   b    infi                   @ Loop forever
```



- How I/O devices communicate with the processor
  - Polling: the processor periodically checks the status of an I/O device (through the OS) to determine its need for service
    - The processor is totally in control but does all the work
    - In real-time embedded applications:
      - I/O rates are predetermined, which makes I/O overhead predictable (helpful for real time)
    - Can waste a lot of processor time due to speed differences
  - Interrupt-driven I/O: the I/O device issues an interrupt to indicate that it needs attention
    - Advantages of using interrupts
      - Relieves the processor from having to continuously poll for an I/O event; user program progress is only suspended during the actual transfer of I/O data to/from user memory space
    - Disadvantage: special hardware is needed to
      - Indicate the I/O device causing the interrupt, save the necessary information before servicing the interrupt, and resume normal processing after servicing it


## **Polling**

- Periodically check the I/O status register
  - If the device is ready, do the operation
  - If there is an error, take action
- Common in small or low-performance real-time embedded systems
  - Predictable timing
  - Low hardware cost
- In other systems, polling wastes CPU time
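The polling loop described above, as a C sketch: read the status register, act on an error or on ready, otherwise read it again. The status bit values and the `max_checks` bound are assumptions for illustration; on real hardware `status` would point at the device's status port:

```c
#include <assert.h>
#include <stdint.h>

#define ST_READY 0x1u /* hypothetical: device ready for service */
#define ST_ERROR 0x2u /* hypothetical: device reports an error  */

typedef enum { POLL_READY, POLL_ERROR, POLL_TIMEOUT } poll_result;

/* Busy-wait on a status register for at most max_checks reads. */
poll_result poll_status(const volatile uint32_t *status, unsigned max_checks) {
    assert(status);
    for (unsigned i = 0; i < max_checks; i++) {
        uint32_t s = *status;                /* volatile: a fresh read each time */
        if (s & ST_ERROR) return POLL_ERROR; /* if error, take action            */
        if (s & ST_READY) return POLL_READY; /* if ready, do the operation       */
    }
    return POLL_TIMEOUT; /* every iteration was processor time spent waiting */
}
```

The bound is a design choice so that a dead device cannot hang the caller; a small embedded system with predictable timing might instead poll forever.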










## **Example 2**

Code for checking the push button (GPIO2) and turning the LED on:






## **Example 3: red LED blinking**

- We must:
  - 1. Configure GPIO9
  - 2. Turn the LED on
  - 3. Wait some time
  - 4. Turn the LED off
  - 5. Wait some time
  - 6. Repeat steps 2-5 forever
- We need:
  - A routine that "waits"


## **Example 3: red LED blinking**

```
@ GPIO9 configuration, and turning the LED on and off
.set GPBASE,  0x3F200000
.set GPFSEL0, 0x00
.set GPSET0,  0x1c
.set GPCLR0,  0x28
@ Timer read
.set STBASE,  0x3F003000
.set STCLO,   0x04
```

Our "waiting" routine will:

- 1. Read the waiting time (input parameter) and add it to the current timer value to obtain the ending time
- 2. Repeat: read the current timer value while it is lower than the ending time
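The same routine in C, as a sketch that mirrors the assembly instruction by instruction. `read_clo` stands in for the load from STBASE + STCLO; the simulated counter `sim_read_clo`, which advances one microsecond per read, is an assumption added so the code can run off the board:

```c
#include <assert.h>
#include <stdint.h>

/* Reads the low 32 bits of the 1 MHz free-running counter (CLO).
   On the board this would be a volatile load from 0x3F003004. */
typedef uint32_t (*read_clo_fn)(void);

/* Hypothetical simulated CLO for running off the board: +1 us per read. */
static uint32_t sim_clo_now = 0;
uint32_t sim_read_clo(void) { return sim_clo_now++; }

/* C version of "espera": compute the ending time once, then re-read CLO
   while it is lower (unsigned compare, like blo) than the ending time. */
void espera(read_clo_fn read_clo, uint32_t wait_us) {
    assert(read_clo);
    uint32_t end = read_clo() + wait_us; /* ldr r4, [r0, #STCLO]; add r4, r1 */
    while (read_clo() < end) {           /* ret1: ldr r5, ...; cmp r5, r4; blo ret1 */
        /* busy-wait */
    }
}
```

Because, like `blo`, it compares absolute counter values, the loop returns immediately if `end` wraps past 0xFFFFFFFF (which the 32-bit counter does about every 71.6 minutes). Testing `(int32_t)(read_clo() - end) < 0` instead would be wrap-safe; that variant is an alternative, not what the listings do.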


## **Example 3: red LED blinking**

```
.set GPBASE,  0x3F200000
.set GPFSEL0, 0x00
.set GPSET0,  0x1c
.set GPCLR0,  0x28
.set STBASE,  0x3F003000
.set STCLO,   0x04
```

#### Routine implementation:

- We can use registers r0 and r1 as input parameters:
  - r0 contains the timer port address
  - r1 contains the waiting time
- We must preserve the registers modified inside our routine:
  - r4 contains the ending time
  - r5 loads the current timer value

```
espera: push {r4, r5}               @ Save r4 and r5 in the stack
        ldr  r4, [r0, #STCLO]       @ Load CLO timer
        add  r4, r1                 @ Add waiting time -> this is our ending time
ret1:   ldr  r5, [r0, #STCLO]       @ Enter waiting loop: load current CLO timer
        cmp  r5, r4                 @ Compare current time with ending time
        blo  ret1                   @ If lower, go back to read timer again
        pop  {r4, r5}               @ Restore r4 and r5
        bx   lr                     @ Return from routine
```


## **Example 3: red LED blinking**

```
.set GPBASE,  0x3F200000
.set GPFSEL0, 0x00
.set GPSET0,  0x1c
.set GPCLR0,  0x28
.set STBASE,  0x3F003000
.set STCLO,   0x04
```

#### Our main program must:

- Initialize the stack
- Configure GPIO9
- Initialize the timer access (r0) and waiting time (r1) parameters
- Turn the LED on and off, calling the "waiting" routine between them

```
espera: push {r4, r5}               @ Save r4 and r5 in the stack
        ldr  r4, [r0, #STCLO]       @ Load CLO timer
        add  r4, r1                 @ Add waiting time -> this is our ending time
ret1:   ldr  r5, [r0, #STCLO]       @ Enter waiting loop: load current CLO timer
        cmp  r5, r4                 @ Compare current time with ending time
        blo  ret1                   @ If lower, go back to read timer again
        pop  {r4, r5}               @ Restore r4 and r5
        bx   lr                     @ Return from routine
```




## **Example 3: red LED blinking**

```
.set GPBASE,  0x3F200000
.set GPFSEL0, 0x00
.set GPSET0,  0x1c
.set GPCLR0,  0x28
.set STBASE,  0x3F003000
.set STCLO,   0x04
.text
        mov  r0, #0b11010011
        msr  cpsr_c, r0             @ SVC mode enabled
        mov  sp, #0x08000000        @ Init stack in SVC mode
        ldr  r4, =GPBASE
/* guide bits       xx999888777666555444333222111000 */
        mov  r5, #0b00001000000000000000000000000000
        str  r5, [r4, #GPFSEL0]     @ Configure GPIO9 as output
/* guide bits       10987654321098765432109876543210 */
        mov  r5, #0b00000000000000000000001000000000  @ r5 = GPIO9 mask
        ldr  r0, =STBASE            @ r0 is an input parameter (ST base address)
        ldr  r1, =500000            @ r1 is an input parameter (waiting time in microseconds)
bucle:  bl   espera                 @ Call waiting routine
        str  r5, [r4, #GPSET0]      @ Turn LED on
        bl   espera                 @ Call waiting routine
        str  r5, [r4, #GPCLR0]      @ Turn LED off
        b    bucle

espera: push {r4, r5}               @ Save r4 and r5 in the stack
        ldr  r4, [r0, #STCLO]       @ Load CLO timer
        add  r4, r1                 @ Add waiting time -> this is our ending time
ret1:   ldr  r5, [r0, #STCLO]       @ Enter waiting loop: load current CLO timer
        cmp  r5, r4                 @ Compare current time with ending time
        blo  ret1                   @ If lower, go back to read timer again
        pop  {r4, r5}               @ Restore r4 and r5
        bx   lr                     @ Return from routine
```



## **Example 4: sound generation**

```
.set GPBASE,  0x3F200000
.set GPFSEL0, 0x00
.set GPSET0,  0x1c
.set GPCLR0,  0x28
.set STBASE,  0x3F003000
.set STCLO,   0x04
.text
        mov  r0, #0b11010011
        msr  cpsr_c, r0             @ SVC mode enabled
        mov  sp, #0x08000000        @ Init stack in SVC mode
        ldr  r4, =GPBASE
/* guide bits       xx999888777666555444333222111000 */
        mov  r5, #0b00000000000000000001000000000000
        str  r5, [r4, #GPFSEL0]     @ Configure GPIO4 as output
/* guide bits       10987654321098765432109876543210 */
        mov  r5, #0b00000000000000000000000000010000  @ r5 = GPIO4 mask
        ldr  r0, =STBASE            @ r0 is an input parameter (ST base address)
        ldr  r1, =1136              @ r1 is an input parameter (waiting time in microseconds)
bucle:  bl   espera                 @ Call waiting routine
        str  r5, [r4, #GPSET0]      @ Drive GPIO4 high
        bl   espera                 @ Call waiting routine
        str  r5, [r4, #GPCLR0]      @ Drive GPIO4 low
        b    bucle

espera: push {r4, r5}               @ Save r4 and r5 in the stack
        ldr  r4, [r0, #STCLO]       @ Load CLO timer
        add  r4, r1                 @ Add waiting time -> this is our ending time
ret1:   ldr  r5, [r0, #STCLO]       @ Enter waiting loop: load current CLO timer
        cmp  r5, r4                 @ Compare current time with ending time
        blo  ret1                   @ If lower, go back to read timer again
        pop  {r4, r5}               @ Restore r4 and r5
        bx   lr                     @ Return from routine
```
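Example 4 is the blink loop with a much shorter wait, so the GPIO pin toggles fast enough to be heard. Each period consists of one wait with the pin high and one with it low, so each wait is half a period: 1000000 / (2f) microseconds. For f = 440 Hz (presumably the intended note, A4) this gives 1136, the constant loaded into r1. A small C sketch of that arithmetic, ignoring the few cycles the loop itself takes; the function names are hypothetical:

```c
#include <assert.h>
#include <stdint.h>

/* Microseconds to wait between toggles for a square wave of freq_hz.
   One period = one "high" wait + one "low" wait; integer division truncates. */
uint32_t half_period_us(uint32_t freq_hz) {
    assert(freq_hz > 0);
    return 1000000u / (2u * freq_hz);
}

/* Frequency actually produced by a given wait (loop overhead ignored). */
double tone_hz(uint32_t half_period) {
    assert(half_period > 0);
    return 1000000.0 / (2.0 * half_period);
}
```

The truncated 1136 µs wait produces about 440.14 Hz, and the 500000 µs wait of Example 3 gives exactly 1 Hz, one blink per second.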

# **Exceptions**


## **Exceptions**

- "Unexpected" events requiring change in flow of instructions execution
  - Branches and jumps are excluded (they are "expected" changes)
- Two possible sources of exceptions
  - Internal exceptions
    - e.g., undefined opcode, overflow, syscall, ...
  - External exceptions → INTERRUPTS
    - From an external device (other than memory)
- Dealing with them without sacrificing performance is hard


## **Dealing with Exceptions**

- Different ISAs use the terms differently
  - Traps, exceptions, interrupts ...
    - e.g., Intel x86 uses both "exceptions" and "interrupts"
- Convention:
  - Exception: any event (other than branches and jumps) that *changes the normal flow* of instructions
    - If it is caused by an external event, the exception is called an interrupt


## **Dealing with Exceptions**

- Exceptions are just another form of control hazard. Exceptions (interrupts) arise from:
  - Arithmetic overflow (internal, exception)
  - Trying to execute an undefined instruction (internal, exception)
  - An OS service request (e.g., a page fault) (internal, exception)
  - A hardware malfunction (internal or external)
  - An I/O device request (external, interrupt)
  - Invoking the OS from the user program (internal, software interrupt or system call)
- The software (OS) /HW looks at the cause of the exception and "deals" with it


## **Two Types of Exceptions**

- Internal exceptions: synchronous to program execution
  - caused by internal events
  - condition must be remedied by the trap handler:
    - stop the offending instruction midstream in the pipeline
    - pass control to the OS trap handler
  - the offending instruction may be retried (or simulated by the OS) and the program may continue or it may be aborted


## **Two Types of Exceptions**

- External exceptions → **Interrupts**: asynchronous to program execution
  - caused by external events
  - may be handled between instructions:
    - let the prior instructions currently active in the pipeline complete
    - pass control to the OS interrupt handler
  - simply suspend and resume user program




### **Interrupt Driven I/O**

- An I/O interrupt is asynchronous with respect to instruction execution
  - It is not associated with any instruction, so it does not prevent any instruction from completing
    - The processor can pick a convenient point to handle the interrupt
    - The control unit only needs to check for a pending I/O interrupt when it starts a new instruction
- With I/O interrupts
  - A way to identify the device generating the interrupt is needed
    - Vectored interrupts: the device sends a vector (identifier) to the processor, which uses it to index the interrupt vector table and obtain the address of the handler
    - Non-vectored interrupts: a status field in the Cause register identifies the cause, and the processor jumps to a handler at a fixed address
    - Auto-vectored interrupts: each exception type has a vector associated with it
    - When the handler gets control, it knows the identity of the device and can immediately start the I/O operation
  - Interrupts can have different urgencies, so they need to be prioritized
    - I/O interrupts have lower priority than internal exceptions
    - UNIX uses four to six levels
    - Interrupt priority levels (IPLs), assigned by the OS to each process, can be raised and lowered by changing the interrupt mask field of the Status register
      - Lowest IPL: all interrupts are permitted
      - Highest IPL: all interrupts are blocked


## **Exceptions in ARM**

- ARM's exception system is auto-vectored
  - There are 8 exception types, NI = 0 to 7
  - Each NI has an exception vector associated with it
    - The exception vector is a branch to a handler
    - NI\*4 is the offset to the exception vectors table

| Exception             | Type         | Offset | Mode      |
|-----------------------|--------------|--------|-----------|
| Reset                 | Interrupt    | 0x00   | SVC       |
| Undefined Instruction | Exception    | 0x04   | Undefined |
| SW interrupt          | SW interrupt | 0x08   | SVC       |
| Prefetch abort        | Exception    | 0x0C   | Abort     |
| Data abort            | Exception    | 0x10   | Abort     |
| Reserved              | -            | 0x14   | -         |
| IRQ                   | Interrupt    | 0x18   | IRQ       |
| FIQ                   | Interrupt    | 0x1C   | FIQ       |
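Because every vector is a single 4-byte instruction, the offset column is simply NI × 4. A C sketch of that rule; the enum names are hypothetical labels for NI = 0 to 7 in table order:

```c
#include <assert.h>
#include <stdint.h>

/* Exception numbers NI = 0..7, in vector-table order. */
typedef enum {
    EXC_RESET, EXC_UNDEF, EXC_SWI, EXC_PREFETCH_ABORT,
    EXC_DATA_ABORT, EXC_RESERVED, EXC_IRQ, EXC_FIQ
} arm_exception;

/* Each vector holds one 4-byte instruction, so the offset is NI * 4. */
uint32_t vector_offset(arm_exception ni) {
    assert(ni <= EXC_FIQ);
    return (uint32_t)ni * 4u;
}

/* Address of a vector for a table based at 0x00000000 or at 0xFFFF0000. */
uint32_t vector_address(uint32_t table_base, arm_exception ni) {
    return table_base + vector_offset(ni);
}
```

With the table at 0, the IRQ vector is at 0x18 and the FIQ vector, the last one, at 0x1C.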


## **Exceptions in ARM**

- Types of exceptions:
  - Reset: the reset pins (P6) trigger the boot loader
  - Undefined instruction: invalid opcode
  - Software interrupts: system calls
  - Prefetch abort / data abort: memory misalignment, access privilege errors
  - IRQ: interrupts from external devices
  - FIQ: fast interrupts


## **Exception priorities**

- When multiple exceptions arise at the same time, a fixed priority system determines the order in which they are handled:

| Priority | Level | Exception                             |
|----------|-------|---------------------------------------|
| Highest  | 1     | Reset                                 |
|          | 2     | Precise Data Abort                    |
|          | 3     | FIQ                                   |
|          | 4     | IRQ                                   |
|          | 5     | Prefetch Abort                        |
|          | 6     | Imprecise Data Abort                  |
| Lowest   | 7     | BKPT, Undefined Instruction, SVC, SMC |
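Choosing which of several simultaneous exceptions is handled first is a minimum search over this table. A C sketch of the ordering rule only, not of how the core implements it; the enum names are hypothetical:

```c
#include <assert.h>
#include <stddef.h>

/* Priority levels from the table (1 = highest). BKPT, Undefined
   Instruction, SVC and SMC share the lowest level, 7. */
typedef enum {
    PRI_RESET = 1, PRI_PRECISE_DATA_ABORT, PRI_FIQ, PRI_IRQ,
    PRI_PREFETCH_ABORT, PRI_IMPRECISE_DATA_ABORT, PRI_LOWEST_GROUP
} exc_priority;

/* Among the exceptions raised together, return the one handled first. */
exc_priority first_to_handle(const exc_priority *raised, size_t n) {
    assert(raised && n > 0);
    exc_priority best = raised[0];
    for (size_t i = 1; i < n; i++)
        if (raised[i] < best) best = raised[i]; /* smaller number = higher priority */
    return best;
}
```

For instance, if an IRQ, an FIQ and an SVC arrive together, the FIQ is handled first.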






## **Handling exceptions**

- When an exception occurs, the core...
  - Copies CPSR into `SPSR_<mode>`
  - Sets appropriate CPSR bits
    - Change to ARM state
    - Change to exception mode
    - Disable interrupts (if appropriate)
  - Stores the return address in `LR_<mode>`
  - Sets PC to vector address
- To return, exception handler needs to...
  - Restore CPSR from `SPSR_<mode>`
  - Restore PC from `LR_<mode>`

*Figure: vector table at 0x00 with one entry per exception: Reset (0x00), Undefined Instruction (0x04), Supervisor Call (0x08), Prefetch Abort (0x0C), Data Abort (0x10), reserved (0x14), IRQ (0x18, holding `B irq_handler`) and FIQ (0x1C).*

Vector table can also be at 0xFFFF0000 on most cores

This can only be done in ARM state.


## **Handling exceptions**

*Figure: the main application is executing instruction i at address X (i+1 follows at X+4) when the exception handler is invoked.*

- 1. Save processor status
  - Stores PC in `LR_<mode>`
    - Adjusts LR based on the exception type
    - Stores X+8 → `LR_<mode>`
  - Copies CPSR into `SPSR_<mode>`
- 2. Change processor status for the exception
  - Forces the CPSR mode bits to a value (depends on the exception)
  - Sets PC to the vector address
- 3. Execute the exception handler
  - `<user code>`
- 4. Return to the main application
  - Restore CPSR from `SPSR_<mode>`
  - Restore PC: PC ← `LR_<mode>` - 4
- Steps 1 and 2 are performed automatically by the core
- Steps 3 and 4 are the responsibility of software




## **Exception handler**

- Basic structure of an exception handler
  - Interrupt: the return is done with `lr - 4`
  - Internal exception (such as a data abort): the return is done with `lr - 8`
  - The user must manage the A, I and F flags to disable/enable nesting of new exceptions and interrupts.
    - Initially, interrupts are disabled (I = F = 1).
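The two return rules follow from the chapter's model, in which the core saves X + 8 in `LR_<mode>` for an exception taken at instruction i (address X). A C sketch of the arithmetic; the function names are hypothetical. In assembly, `subs pc, lr, #4` performs the IRQ subtraction and, because of the `s` suffix with pc as destination, also restores CPSR from SPSR:

```c
#include <assert.h>
#include <stdint.h>

/* Value saved in LR_<mode> when the exception is taken at address x. */
uint32_t lr_on_entry(uint32_t x) {
    assert(x % 4 == 0); /* ARM-state instructions are word aligned */
    return x + 8u;
}

/* Interrupt: instruction i completed, so resume at i+1 (x + 4). */
uint32_t irq_return_pc(uint32_t lr) { return lr - 4u; }

/* Data abort: instruction i did not complete, so re-execute it (x). */
uint32_t data_abort_return_pc(uint32_t lr) { return lr - 8u; }
```

For an interrupt taken at 0x8000, LR holds 0x8008, execution resumes at 0x8004, and an aborted load at the same address would be retried at 0x8000.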

#### `irq_handler`:

- 1. Push the registers to be used
- 2. Identify the source of the interrupt
- 3. Perform the handler work depending on step 2
- 4. Clear the event (notify the device that the IRQ/FIQ has been served)
- 5. Pop the registers
- 6. Return from the handler: `subs pc, lr, #4`




## Main program: steps to set up interrupts

- 1. Initialize the IRQ/FIQ entries in the vector table
- 2. Initialize the stacks for FIQ and IRQ modes:

```
sp_fiq <- 0x00004000
sp_irq <- 0x00008000
```

- 3. Initialize the stack for SVC mode (SVC mode selected):

```
sp_svc <- 0x08000000
```

- 4. Configure GPIOs (inputs and outputs)
- 5. Configure the peripheral interrupts: timer/push-buttons
- 6. Enable the configured interrupts locally
- 7. Enable interrupts globally (SVC mode)
- 8. ... (main program tasks)


## Initialize Vector Table

- To write in the vector table, we can use a macro, ADDEXC, that computes the offset of the exception handler and writes the vector in the vector table.

```
        mov  r0, #0                 @ Vector table base = 0
        ADDEXC 0x18, irq_handler
```


### macro ADDEXC offset, dirDest

- The IRQ handler is located at dirDest
- The vector table stores a branch instruction to the IRQ handler, b disp, located at offset (0x18)
- disp is the number of bytes between dirDest and offset, divided by 4
- When b disp executes, pc already points two instructions (8 bytes) ahead of it (pc = 0x18 + 8)
- Thus, disp must be

$$disp = \frac{\left(dirDest - (offset + 8)\right)}{4}$$
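The formula can be checked by encoding the branch word that the vector table must hold. A C sketch, assuming the standard ARM encoding of an unconditional `B` (condition 1110, bits 27 to 25 = 101, L = 0, signed 24-bit word offset in the low bits). It models only the arithmetic, not the course's ADDEXC macro; the function names are hypothetical:

```c
#include <assert.h>
#include <stdint.h>

/* disp = (dirDest - (offset + 8)) / 4, as a signed word count. */
int32_t branch_disp(uint32_t offset, uint32_t dir_dest) {
    int32_t bytes = (int32_t)(dir_dest - (offset + 8u));
    assert(bytes % 4 == 0); /* handler must be word aligned */
    return bytes / 4;
}

/* 32-bit "b disp" placed at offset: 0xEA000000 | 24-bit two's complement disp. */
uint32_t encode_b(uint32_t offset, uint32_t dir_dest) {
    int32_t disp = branch_disp(offset, dir_dest);
    assert(disp >= -(1 << 23) && disp < (1 << 23)); /* must fit in 24 bits */
    return 0xEA000000u | ((uint32_t)disp & 0x00FFFFFFu);
}
```

For an IRQ handler at 0x8000, disp = (0x8000 - 0x20) / 4 = 0x1FF8, so the word stored at offset 0x18 is 0xEA001FF8. A branch to its own address gives disp = -2 (0xEAFFFFFE).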








## **2-3** Initialize the stacks

- Each mode has its own stack pointer (sp). For each mode:
  - Change to that mode (via `cpsr_c`)
  - Instructions: `msr` (sr <- reg) and `mrs` (reg <- sr)
  - Initialize the corresponding sp register
- The initial state in bare metal is SVC mode
  - sp_fiq = 0x4000, sp_irq = 0x8000, sp_svc = 0x08000000:

```
        mov  r0, #0                 @ Pointer to vector table
        ADDEXC 0x18, irq_handler
        ADDEXC 0x1c, fiq_handler
        mov  r0, #0b11010001        @ FIQ mode, FIQ and IRQ disabled
        msr  cpsr_c, r0
        mov  sp, #0x4000
        mov  r0, #0b11010010        @ IRQ mode, FIQ and IRQ disabled
        msr  cpsr_c, r0
        mov  sp, #0x8000
        mov  r0, #0b11010011        @ SVC mode, FIQ and IRQ disabled
        msr  cpsr_c, r0
        mov  sp, #0x08000000
```
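The immediates written with `msr cpsr_c` pack four fields into the low byte of the CPSR: I (bit 7), F (bit 6), T (bit 5) and the mode (bits 4 to 0). A C sketch that builds them; the helper and constant names are hypothetical, while the bit positions and mode values are the ARM ones:

```c
#include <assert.h>
#include <stdint.h>

#define CPSR_I (1u << 7) /* 1 = IRQ disabled             */
#define CPSR_F (1u << 6) /* 1 = FIQ disabled             */
#define CPSR_T (1u << 5) /* 1 = Thumb state, 0 = ARM state */

/* Mode field, bits 4..0. */
enum { MODE_USR = 0x10, MODE_FIQ = 0x11, MODE_IRQ = 0x12, MODE_SVC = 0x13 };

/* Low CPSR byte for a mode in ARM state (T = 0) with the chosen disable bits. */
uint8_t cpsr_c(uint8_t mode, int irq_disabled, int fiq_disabled) {
    assert((mode & ~0x1Fu) == 0);
    uint8_t v = mode;
    if (irq_disabled) v |= CPSR_I;
    if (fiq_disabled) v |= CPSR_F;
    return v;
}
```

`cpsr_c(MODE_FIQ, 1, 1)` yields 0xD1 (0b11010001) and `cpsr_c(MODE_SVC, 1, 1)` yields 0xD3 (0b11010011), the values used above. Clearing both disable bits, `cpsr_c(MODE_SVC, 0, 0)` = 0x13, corresponds to the global enabling of interrupts in SVC mode.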


## **5** Configure peripheral interrupts

- GPIO interrupts (push-buttons): use GPRENn, GPFENn, GPHENn, GPLENn, GPARENn and GPAFENn
- System Timer: write the final count (microseconds) in STC1/STC3

```
@ Configure timer IRQ
        ldr  r0, =STBASE
        ldr  r1, [r0, #STCLO]
        add  r1, #y                 @ y microseconds
        str  r1, [r0, #STC1]
```




## Source of the interrupt?

- When an interrupt occurs, the handler must identify its source by reading the IRQ pending ports
  - GPIO interrupt detection: use GPEDSn
- System Timer interrupt detection: STCS flags the interrupts caused by the C0 to C3 compare registers

```
@ Source of timer interrupt?
        ldr   r0, =STBASE
        ldr   r2, [r0, #STCS]
        ands  r2, #0b0010           @ C1?
        ...
        ldr   r2, [r0, #STCS]
        ands  r2, #0b1000           @ C3?
```
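Both timer fragments reduce to simple modular and bit arithmetic. A C sketch, assuming the System Timer layout used in this chapter (STCS at 0x00 with match flags M0 to M3 in bits 0 to 3, STCLO at 0x04, STC1 at 0x10, STC3 at 0x18); the helper names are hypothetical:

```c
#include <assert.h>
#include <stdint.h>

/* System Timer register offsets from STBASE (0x3F003000). */
#define STCS  0x00u /* match flags M0..M3 in bits 0..3 */
#define STCLO 0x04u /* low 32 bits of the 1 MHz counter */
#define STC1  0x10u /* compare register 1 */
#define STC3  0x18u /* compare register 3 */

#define STCS_M1 (1u << 1) /* C1 matched: 0b0010 */
#define STCS_M3 (1u << 3) /* C3 matched: 0b1000 */

/* Value to store in STC1/STC3 for an interrupt y_us microseconds from now.
   Unsigned addition wraps modulo 2^32, just like the counter itself. */
uint32_t compare_value(uint32_t clo_now, uint32_t y_us) {
    assert(y_us > 0);
    return clo_now + y_us;
}

/* Which of the two channels used here (C1, C3) does an STCS value report? */
uint32_t timer_sources(uint32_t stcs) {
    return stcs & (STCS_M1 | STCS_M3);
}
```

After handling a C1 match, the handler acknowledges it by writing `STCS_M1` back to STCS (writing 1 clears the flag), which is the "clear event" step of the handler.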








- The interrupt controller ports are organized in three groups: pending, enable and disable
- In each group:
  - IRQ basic: summary
  - IRQs 1 and 2: in detail
- There is also one port for FIQ control
















## **Example 7: Putting it all together**

- Toggle the red LED at GPIO9 every 4 seconds, or when the push button at GPIO2 is pressed.
- Use IRQ to handle the timer and FIQ to handle the push button.
- Use a variable in memory to keep track of the LED state (on or off).






# Direct Memory Access (DMA)


#### **Direct Memory Access (DMA)**

- For high-bandwidth devices (like disks), polling or interrupt-driven I/O would consume a *lot* of processor cycles
- With DMA, the DMA controller can transfer large blocks of data directly to/from memory without involving the processor
  - 1. The processor initiates the DMA transfer by supplying the I/O device address (identity), the operation to be performed, the memory destination/source address, and the number of bytes to transfer
  - 2. The DMA controller manages the entire transfer (possibly thousands of bytes long), arbitrating for the bus
  - 3. When the DMA transfer is complete (or in case of error), the DMA controller interrupts the processor to let it know
- There may be multiple DMA devices in one system
  - e.g., in systems with a single memory bus and multiple I/O buses, each I/O bus controller often contains a DMA controller
  - The processor and DMA controllers contend for bus cycles and for memory
    - The processor can be delayed when the memory is busy doing a DMA transfer
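The three DMA steps can be modeled in C to show the division of labor: the processor fills in a request once, the controller moves the whole block, and a single completion callback plays the role of the interrupt. Everything here is hypothetical (`dma_request`, `dma_run`, `record_done`), and `memcpy` merely stands in for the controller's bus transfers:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef enum { DMA_DEV_TO_MEM, DMA_MEM_TO_DEV } dma_dir;

/* Step 1: what the processor supplies to start a transfer. */
typedef struct {
    int      device_id;  /* which I/O device                        */
    dma_dir  dir;        /* operation to perform                    */
    uint8_t *memory;     /* memory destination or source            */
    uint8_t *device_buf; /* stands in for the device's data port    */
    size_t   nbytes;     /* number of bytes to transfer             */
    void   (*on_done)(int device_id, int error); /* step 3: "interrupt" */
} dma_request;

/* Step 2: the controller moves the whole block, then signals completion once. */
void dma_run(const dma_request *r) {
    assert(r && r->memory && r->device_buf && r->on_done);
    if (r->dir == DMA_DEV_TO_MEM)
        memcpy(r->memory, r->device_buf, r->nbytes);
    else
        memcpy(r->device_buf, r->memory, r->nbytes);
    r->on_done(r->device_id, 0);
}

/* Example completion handler: remembers which device finished. */
static int last_done_device = -1;
static void record_done(int device_id, int error) {
    if (!error) last_done_device = device_id;
}
```

However large `nbytes` is, the processor is involved only twice: when it builds the request and when the completion handler runs.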








#### The DMA Stale Data or Coherence Problem

- In systems with caches, there can be two copies of a data item, one in the cache and one in the main memory
  - For a DMA input (from disk to memory) the processor will be using stale data if that location is also in the cache
  - For a DMA output (from memory to disk) and a write-back cache – the I/O device will receive stale data if the data is in the cache and has not yet been written back to the memory
- The coherency problem can be solved by:
  - 1. Routing all I/O activity through the cache: expensive, with a large negative performance impact
  - 2. Having the OS invalidate all the entries in the cache for an I/O input, or force write-backs for an I/O output (called a cache flush)
  - 3. Providing hardware to *selectively* invalidate cache entries, e.g., a snooping cache controller


#### **DMA and Virtual Memory Considerations**

- Should the DMA work with virtual addresses or physical addresses?
- If working with physical addresses
  - All DMA transfers must be constrained to stay within one page, because a transfer that crosses a page boundary will not necessarily be contiguous in physical memory
  - If the transfer won't fit in a single page, it can be broken into a series of transfers (each of which fits in a page) that are handled individually and chained together
- If working with virtual addresses
  - The DMA controller will have to translate the virtual address to a physical address (i.e., it will need a TLB structure)
- Whichever is used, the OS must cooperate by not remapping pages while a DMA transfer involving that page is in progress
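Breaking a transfer at page boundaries is a short loop. A C sketch, assuming 4 KiB pages; each resulting chunk would become one link in a chained transfer, and in a real system each virtual page would additionally be translated to its own physical frame. The names are hypothetical:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SIZE 4096u /* assumed page size */

typedef struct { uint32_t addr; uint32_t len; } dma_chunk;

/* Split [addr, addr + len) into pieces that never cross a page boundary.
   Returns the number of chunks written to out. */
size_t split_by_page(uint32_t addr, uint32_t len, dma_chunk *out, size_t max) {
    size_t n = 0;
    while (len > 0) {
        assert(n < max);
        uint32_t room = PAGE_SIZE - (addr % PAGE_SIZE); /* bytes left in this page */
        uint32_t piece = len < room ? len : room;
        out[n++] = (dma_chunk){ addr, piece };
        addr += piece;
        len -= piece;
    }
    return n;
}
```

A 0x300-byte transfer starting at 0x1F00 splits into 0x100 bytes up to the page end at 0x2000, then 0x200 bytes in the next page.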


#### More intelligent controllers: I/O processors

- To further reduce the need to interrupt the processor, the I/O controller can be made more intelligent: I/O processors (also called I/O controllers or channel controllers)
  - They execute a series of I/O operations (an I/O program stored in the I/O processor, or in memory and fetched by the I/O processor) and interrupt the processor only when the entire program has completed.
  - The I/O program is set up by the OS: the I/O operations to be done, and the size and transfer address of any reads or writes
- DMA processors are essentially special-purpose processors (single-chip and nonprogrammable), while I/O processors are often implemented with general-purpose microprocessors that run a specialized I/O program


#### Interfacing I/O Devices to the Processor, Memory, and OS

- The operating system acts as the interface between the I/O hardware and the program requesting I/O
  - It provides equitable access to the shared I/O resources, protects those I/O devices/activities to which a user program doesn't have access, and schedules I/O requests to enhance system throughput
  - It handles interrupts generated by I/O devices
  - It supplies routines for low-level I/O device operations
    - OS must be able to give commands to the I/O devices
    - I/O device must be able to notify the OS about its status
    - Must be able to transfer data between the memory and the I/O device
- Software that communicates with an I/O device is called a device driver; it requires detailed knowledge of the I/O device hardware (its list of ports and their behavior)


# **PC I/O Systems**


## **I/O System Interconnect Issues**

- A bus is a shared communication link (a single set of wires used to connect multiple subsystems) that needs to support a range of devices with widely varying latencies and data transfer rates
  - Advantages
    - Versatile: new devices can be added easily and can be moved between computer systems that use the same bus standard
    - Low cost: a single set of wires is shared in multiple ways
  - Disadvantages
    - Creates a communication bottleneck: bus bandwidth limits the maximum I/O throughput
- The maximum bus speed is largely limited by
  - The length of the bus
  - The number of devices on the bus


#### **Synchronous and Asynchronous Buses**

- Synchronous bus
  - Includes a clock in the control lines and has a fixed protocol for communication that is relative to the clock.
    - It can run very fast, but every device communicating on the bus must use the same clock rate, and fast buses cannot be long, to avoid clock skew
- Asynchronous bus
  - It is not clocked, so it requires a handshaking protocol (and additional control lines).
    - It is slower, but it can accommodate a wide range of devices and device speeds, and it can be lengthened without worrying about clock skew or synchronization problems
- As it became difficult to run many parallel wires at high speed due to clock skew and reflection, the industry transitioned from parallel shared buses to high-speed serial point-to-point interconnections with switches.


# **PC I/O Systems**

- Personal computers (PCs) use a wide variety of I/O protocols: memory, disks, networking, internal expansion cards, external devices.
  - They make it easy for the user to add new devices and offer high performance, at the expense of complexity



- External peripherals (keyboards, webcams...): USB
- High-performance cards (graphics cards): PCI Express x16
- Lower-performance cards: PCI Express x1 or older PCI slots
- Network: Ethernet jack
- Hard disk: SATA port

DRAM connects to the processor over a synchronous parallel bus (more than one channel, to allow simultaneous access to more than one memory bank)
